In this analysis, we explore and visualize the top songs based on their streaming numbers across different platforms. We focus on identifying the top 10 songs released in 2023 and comparing them with the overall top 10 songs by streams. The comparison helps show trends in music popularity and how recent releases stack up against all-time hits.
First, we load the necessary libraries. We use readr for reading the dataset, dplyr for data manipulation, tidyr for reshaping the audio features into long format, ggplot2 for visualization, and plotly for creating interactive plots.
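The setup chunk itself is not shown in this write-up. A minimal sketch of what it might look like follows. The file name `spotify-2023.csv` is a placeholder rather than the actual path, and the column names used later (`track_name`, `artist_name`, `danceability_.`, and so on) are assumed to already match the loaded data.

```r
# Load the packages named above.
library(readr)    # read_csv()
library(dplyr)    # filter(), arrange(), select(), bind_rows()
library(tidyr)    # reshaping audio features to long format
library(ggplot2)  # static plots
library(plotly)   # ggplotly() for interactive versions

# Hypothetical file name: replace with the real location of the dataset.
data_path <- "spotify-2023.csv"

# Read the file only if it exists, so this sketch runs even without the data.
if (file.exists(data_path)) {
  data <- read_csv(data_path)
}
```

The ranking steps that follow rely on `streams` being numeric, so it is worth checking `class(data$streams)` after loading.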
We filter the dataset to select the top 10 songs released in the year 2023. This helps us focus on the most popular recent releases.
# Keep only 2023 releases, rank by streams, and take the first ten
top_10_songs_released_2023 <- data %>%
  filter(released_year == 2023) %>%
  arrange(desc(streams)) %>%
  head(10) %>%
  select(track_name, artist_name, streams)
top_10_songs_released_2023
##                               track_name                artist_name    streams
##  1                               Flowers                Miley Cyrus 1316855716
##  2                       Ella Baila Sola Eslabon Armado, Peso Pluma  725980112
##  3 Shakira: Bzrp Music Sessions, Vol. 53          Shakira, Bizarrap  721975598
##  4                                   TQG           Karol G, Shakira  618990393
##  5                       La Bebe - Remix      Peso Pluma, Yng Lvcas  553634067
##  6                   Die For You - Remix  Ariana Grande, The Weeknd  518745108
##  7                             un x100to  Bad Bunny, Grupo Frontera  505671438
##  8                     Cupid - Twin Ver.                Fifty Fifty  496795686
##  9                                   PRC  Natanael Cano, Peso Pluma  436027885
## 10                                   OMG                   NewJeans  430977451
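`arrange(desc(streams)) %>% head(10)` always returns exactly ten rows, so a tie at the cutoff is cut by row order. As an alternative sketch (not the code used above), dplyr's `slice_max()` expresses the same selection in one step and makes the tie handling explicit. The toy data frame `songs` is made up for illustration.

```r
library(dplyr)

# Made-up toy data, for illustration only (not from the real dataset).
songs <- data.frame(
  track_name    = c("A", "B", "C", "D", "E"),
  released_year = c(2023, 2023, 2022, 2023, 2023),
  streams       = c(900, 1500, 1200, 900, 100)
)

# Top 2 songs released in 2023; with_ties = FALSE keeps exactly n rows.
top_2_2023 <- songs %>%
  filter(released_year == 2023) %>%
  slice_max(streams, n = 2, with_ties = FALSE)

# The default (with_ties = TRUE) keeps every row tied at the cutoff,
# so "A" and "D" (both 900 streams) are returned alongside "B".
top_2_2023_ties <- songs %>%
  filter(released_year == 2023) %>%
  slice_max(streams, n = 2)
```

With real stream counts exact ties are unlikely, so either approach should give the same ten rows here.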
Here, we select the overall top 10 songs by their number of streams. This gives us an insight into the most popular songs regardless of their release year.
# Overall top 10 by streams; despite the `_2023` suffix, this is not limited to 2023 releases
top_10_songs_2023 <- data %>%
  arrange(desc(streams)) %>%
  head(10) %>%
  select(track_name, released_year, artist_name, streams)
top_10_songs_2023
##                                       track_name released_year
##  1                               Blinding Lights          2019
##  2                                  Shape of You          2017
##  3                             Someone You Loved          2018
##  4                                  Dance Monkey          2019
##  5 Sunflower - Spider-Man: Into the Spider-Verse          2018
##  6                                     One Dance          2016
##  7                     STAY (with Justin Bieber)          2021
##  8                                      Believer          2017
##  9                                        Closer          2016
## 10                                       Starboy          2016
##                     artist_name    streams
##  1                   The Weeknd 3703895074
##  2                   Ed Sheeran 3562543890
##  3                Lewis Capaldi 2887241814
##  4                  Tones and I 2864791672
##  5        Post Malone, Swae Lee 2808096550
##  6          Drake, WizKid, Kyla 2713922350
##  7 Justin Bieber, The Kid Laroi 2665343922
##  8              Imagine Dragons 2594040133
##  9     The Chainsmokers, Halsey 2591224264
## 10        The Weeknd, Daft Punk 2565529693
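To put a number on how the 2023 releases stack up, the stream counts printed in the two tables above can be compared directly. The sketch below hard-codes two of those printed values rather than recomputing them from `data`; the variable names are illustrative.

```r
# Stream counts copied from the two printed tables above.
top_2023_streams      <- 1316855716  # "Flowers", rank 1 among 2023 releases
tenth_overall_streams <- 2565529693  # "Starboy", rank 10 overall

# Share of the 10th overall song's streams reached by the top 2023 release.
share <- top_2023_streams / tenth_overall_streams
round(100 * share, 1)  # percentage
```

Even the most-streamed 2023 release has only about half the streams of the tenth song overall, so no 2023 release appears in the overall top 10.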
We combine the two datasets (the top 10 songs released in 2023 and the overall top 10 songs) so that both can be compared in a single visualization. We build a base plot with ggplot2 and convert it to an interactive plot with plotly. Each line shows streams by rank within its category, with color and line type distinguishing songs released in 2023 from the overall top 10. Hovering over a point displays the track name and artist.
top_10_songs_released_2023$type <- "Released in 2023"
top_10_songs_2023$type <- "Top 10 Overall"
combined_data <- bind_rows(top_10_songs_released_2023, top_10_songs_2023)
# Rank songs within each category (1 = most streamed), then drop the grouping
combined_data <- combined_data %>%
  group_by(type) %>%
  mutate(position = row_number()) %>%
  ungroup()
p <- ggplot(combined_data,
            aes(x = position, y = streams / 1e6, group = type,
                color = type, linetype = type,
                text = paste("Track Name:", track_name, "<br>Artist:", artist_name))) +
  geom_line(linewidth = 1.5) +  # `linewidth` replaces `size` for lines (ggplot2 >= 3.4.0)
  geom_point(size = 3) +
  scale_x_continuous(breaks = 1:10) +
  labs(title = "Top 10 Songs by Streams",
       x = "Track Position",
       y = "Number of Streams (in millions)",
       color = "Category",
       linetype = "Category") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "bottom")
ggplotly(p, tooltip = "text")
This line plot compares the streaming numbers of the top 10 songs released in 2023 with those of the overall top 10 songs. Hovering over a point shows the track and artist, which makes it easy to compare the two lists rank by rank.
Analyze and Compare Top and Next 10 Songs
We then compare the top 10 songs with the next 10 songs (ranks 11 to 20) in terms of streaming numbers and audio features. We use ggplot2 to create a bar plot of streams and a box plot of audio features.
# Filter for top 10 songs
top_10_songs <- data %>%
  arrange(desc(streams)) %>%
  head(10)
# Filter for next 10 songs
next_10_songs <- data %>%
  arrange(desc(streams)) %>%
  slice(11:20)
# Combine the two datasets
top_10_songs$type <- "Top 10"
next_10_songs$type <- "Next 10"
combined_data <- bind_rows(top_10_songs, next_10_songs)
# Plot streaming numbers
p_streams <- ggplot(combined_data,
                    aes(x = reorder(track_name, -streams), y = streams / 1e6, fill = type)) +
  geom_col(position = position_dodge()) +  # same as geom_bar(stat = "identity")
  labs(title = "Streaming Numbers for Top 10 and Next 10 Songs",
       x = "Track Name",
       y = "Number of Streams (in millions)",
       fill = "Category") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Plot audio features
audio_features <- combined_data %>%
  # Keep the audio feature columns and drop the trailing "." from their names
  select(track_name, type,
         danceability_ = danceability_., valence_ = valence_., energy_ = energy_.,
         acousticness_ = acousticness_., instrumentalness_ = instrumentalness_.,
         liveness_ = liveness_., speechiness_ = speechiness_.) %>%
  # Reshape to long format: one row per song and feature (replaces the superseded gather())
  pivot_longer(cols = -c(track_name, type), names_to = "feature", values_to = "value")
p_features <- ggplot(audio_features, aes(x = feature, y = value, fill = type)) +
  geom_boxplot() +
  labs(title = "Audio Features Comparison for Top 10 and Next 10 Songs",
       x = "Audio Feature",
       y = "Value",
       fill = "Category") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Convert to interactive plots
ggplotly(p_streams)
ggplotly(p_features)
The bar plot displays the streaming numbers for the top 10 and next 10 songs, while the box plot compares key audio features between these two groups. Both plots are converted to interactive formats for better analysis.
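A box plot shows medians and spread visually; a small summary table can report the same quantities as numbers. The sketch below is one possible way to do that. It uses a made-up data frame, `audio_features_demo`, shaped like the long-format `audio_features` above, so the values are illustrative and not results from the real dataset.

```r
library(dplyr)

# Made-up long-format data shaped like audio_features (illustration only).
audio_features_demo <- data.frame(
  type    = rep(c("Top 10", "Next 10"), each = 4),
  feature = rep(c("danceability_", "energy_"), times = 4),
  value   = c(50, 70, 60, 80, 40, 65, 55, 75)
)

# Median and interquartile range per feature and group:
# the same quantities a box plot draws as the centre line and box height.
feature_summary <- audio_features_demo %>%
  group_by(feature, type) %>%
  summarise(
    median = median(value),
    iqr    = IQR(value),
    .groups = "drop"
  )
feature_summary
```

Applied to the real `audio_features`, the same `group_by()` and `summarise()` calls would give one row per feature and category.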
Feedback: Roommates suggested that adding labels to the lines in the plot would help differentiate the categories more clearly. They also recommended explaining why certain audio features might be relevant to streaming numbers and comparing the distributions of audio features between groups.
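One way the line-label suggestion could be addressed is to place each category name next to the last point of its line instead of relying on a legend. The sketch below assumes ggplot2 3.4.0 or later (for `linewidth`) and uses a made-up data frame `demo` shaped like the combined top-10 data; how the labels render after `ggplotly()` conversion has not been checked here.

```r
library(dplyr)
library(ggplot2)

# Made-up data shaped like the combined top-10 data (illustration only).
demo <- data.frame(
  type     = rep(c("Released in 2023", "Top 10 Overall"), each = 3),
  position = rep(1:3, times = 2),
  streams  = c(1300, 730, 720, 3700, 3560, 2890) * 1e6
)

# One label per line, taken from the last point of each category.
line_labels <- demo %>%
  group_by(type) %>%
  slice_max(position, n = 1) %>%
  ungroup()

p_labeled <- ggplot(demo, aes(x = position, y = streams / 1e6, color = type)) +
  geom_line(linewidth = 1.5) +
  geom_point(size = 3) +
  # Draw the category name just to the right of each line's final point.
  geom_text(data = line_labels, aes(label = type),
            hjust = 0, nudge_x = 0.1, show.legend = FALSE) +
  # Extra room on the right so the labels are not clipped.
  scale_x_continuous(breaks = 1:3, expand = expansion(mult = c(0.05, 0.4))) +
  labs(x = "Track Position", y = "Number of Streams (in millions)") +
  theme_minimal() +
  theme(legend.position = "none")
```

Direct labels remove the need to look back and forth between the lines and the legend, at the cost of some horizontal space.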